Spectral biclustering of microarray data: coclustering genes and conditions.

نویسندگان

  • Yuval Kluger
  • Ronen Basri
  • Joseph T Chang
  • Mark Gerstein
چکیده

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find "marker genes" that are differentially expressed in particular sets of "conditions." We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

به کارگیری خوشه‌بندی دوبعدی با روش «زیرماتریس‌های با میانگین- درایه‌های بزرگ» در داده‌های بیان ژنی حاصل از ریزآرایه‌های DNA

Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...

متن کامل

Spectral biclustering of microarray cancer data: co-clustering genes and conditions

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to simultaneously find "marker genes" that are differentially expressed in particular “conditions”. We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of ...

متن کامل

A New Survey on Biclustering of Microarray Data

There are subsets of genes that have similar behavior under subsets of conditions, so we say that they coexpress, but behave independently under other subsets of conditions. Discovering such coexpressions can be helpful to uncover genomic knowledge such as gene networks or gene interactions. That is why, it is of utmost importance to make a simultaneous clustering of genes and conditions to ide...

متن کامل

Analysis of a Gibbs sampler method for model-based clustering of gene expression data

MOTIVATION Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis...

متن کامل

A Comparative Study of Clustering and Biclustering of Microarray Data

There are subsets of genes that have similar behavior under subsets of conditions, so we say that they coexpress, but behave independently under other subsets of conditions. Discovering such coexpressions can be helpful to uncover genomic knowledge such as gene networks or gene interactions. That is why, it is of utmost importance to make a simultaneous clustering of genes and conditions to ide...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Genome research

دوره 13 4  شماره 

صفحات  -

تاریخ انتشار 2003